Functions and Formulations
==========================

Dataset preprocessing
----------------------

Assume an MSI dataset represented as a 3D array of dimensions :math:`H \times W \times N`, where :math:`H` and :math:`W` are the spatial dimensions of the tissue sample, and :math:`N` is the number of measured ions (*m/z*). Each ion image :math:`I_k \in \mathbb{R}^{H \times W}` captures the spatial intensity distribution of the :math:`k^{\text{th}}` *m/z* value across the sample, for :math:`k = 1, \dots, N`. Conversely, each pixel location :math:`(i, j)`, with :math:`i = 1, \dots, H` and :math:`j = 1, \dots, W`, corresponds to a spectrum :math:`S_{i,j} \in \mathbb{R}^N`, which contains the intensity values for all :math:`N` *m/z* channels at that location.
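The two views of the dataset described above can be illustrated with a small NumPy sketch. The array name ``data``, the toy dimensions, and the random values are assumptions made for illustration only, and indices are 0-based here, unlike the 1-based notation in the text.

.. code-block:: python

   import numpy as np

   # Hypothetical toy dataset: H x W pixels, N m/z channels (random values).
   H, W, N = 4, 5, 3
   rng = np.random.default_rng(0)
   data = rng.random((H, W, N))

   # Ion image I_k: spatial distribution of the k-th m/z channel.
   k = 1
   ion_image = data[:, :, k]   # shape (H, W)

   # Spectrum S_{i,j}: intensities of all N channels at pixel (i, j).
   i, j = 2, 3
   spectrum = data[i, j, :]    # shape (N,)

Both views index the same underlying values, so ``ion_image[i, j]`` and ``spectrum[k]`` refer to the same intensity.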

Normalization
^^^^^^^^^^^^^

MassVision currently supports six spectrum normalization methods:

*TIC normalization:*  
Total Ion Current normalization scales each spectrum by the sum of its intensity values. For each pixel, the normalization is defined as:

.. math::

   \text{TIC}(S_{i,j}) = \frac{S_{i,j}}{\sum_{k=1}^{N} S_{i,j}[k]}

Here, :math:`S_{i,j}[k]` denotes the intensity of the :math:`k^{\text{th}}` ion at pixel :math:`(i, j)`, and the denominator represents the total ion current at that location. This operation ensures that all normalized spectra have unit total intensity, helping to mitigate the effects of acquisition-related fluctuations and tissue heterogeneity.
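A minimal NumPy sketch of the TIC formula, not MassVision's actual implementation. The function name ``tic_normalize`` is hypothetical, and leaving pixels with zero total ion current as all zeros is an assumption, since the formula is undefined in that case.

.. code-block:: python

   import numpy as np

   def tic_normalize(data):
       """Divide each spectrum of an (H, W, N) array by its total ion current."""
       # Sum over the m/z axis; keepdims lets the result broadcast over channels.
       tic = data.sum(axis=-1, keepdims=True)
       # Assumption: pixels with zero TIC stay all-zero instead of dividing by 0.
       out = np.zeros_like(data, dtype=float)
       np.divide(data, tic, out=out, where=tic > 0)
       return out

   # One row of two pixels with two channels each; the second pixel is empty.
   example = np.array([[[1.0, 3.0], [0.0, 0.0]]])
   normalized = tic_normalize(example)

For the first pixel the total ion current is 4, so the normalized spectrum is ``[0.25, 0.75]`` and sums to 1.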

*TSC normalization:*  
Total Signal Current normalization scales each spectrum by the sum of intensity values that exceed a user-defined threshold. For each pixel, the normalization is defined as:

.. math::

   \text{TSC}(S_{i,j}; \tau) = \frac{S_{i,j}}{\sum_{k=1}^{N} S_{i,j}[k] \cdot \mathbf{1}_{\{S_{i,j}[k] > \tau\}}}

Here, :math:`\tau` is the intensity threshold, and :math:`\mathbf{1}_{\{S_{i,j}[k] > \tau\}}` is the indicator function, which equals 1 when :math:`S_{i,j}[k] > \tau` and 0 otherwise. This normalization includes only signal components above the threshold in the total, reducing the influence of background noise and low-intensity fluctuations while preserving biologically relevant variation.
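The TSC formula can be sketched in the same way. The function name ``tsc_normalize`` is hypothetical, and the handling of pixels in which no intensity exceeds the threshold (left as zeros) is an assumption. Note that the threshold only affects the denominator: every channel of the spectrum is still divided by it.

.. code-block:: python

   import numpy as np

   def tsc_normalize(data, tau):
       """Divide each spectrum by the sum of its intensities above tau."""
       # Indicator function: keep only intensities strictly above the threshold.
       above = np.where(data > tau, data, 0.0)
       tsc = above.sum(axis=-1, keepdims=True)
       # Assumption: pixels with no signal above tau are left as zeros.
       out = np.zeros_like(data, dtype=float)
       np.divide(data, tsc, out=out, where=tsc > 0)
       return out

   # Single pixel with three channels; only 4 and 5 exceed tau = 2.
   example = np.array([[[1.0, 4.0, 5.0]]])
   normalized = tsc_normalize(example, tau=2.0)

Here the denominator is 4 + 5 = 9, so the result is ``[1/9, 4/9, 5/9]``; the sub-threshold channel is rescaled but does not contribute to the total.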

*Reference normalization:*  
In reference normalization, each spectrum is scaled by the intensity of a specific reference ion. For each pixel, the normalization is defined as:

.. math::

   \text{Ref}(S_{i,j}; k^*) = \frac{S_{i,j}}{S_{i,j}[k^*]}

Here, :math:`k^* \in \{1, \dots, N\}` is the index of the chosen reference ion (corresponding to a particular *m/z* value), and :math:`S_{i,j}[k^*]` is its intensity at pixel :math:`(i,j)`. This normalization preserves relative ion abundances while anchoring the scale to a biologically or experimentally relevant signal. The reference ion should be consistently present across spectra and stable in intensity to ensure reliable scaling.
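As an illustrative sketch (the name ``reference_normalize`` is hypothetical, and the index is 0-based), reference normalization divides every channel by the reference channel at the same pixel. Leaving pixels where the reference ion is absent as zeros is an assumption; the text above recommends choosing a reference ion for which this rarely happens.

.. code-block:: python

   import numpy as np

   def reference_normalize(data, k_ref):
       """Divide each spectrum by the intensity of channel k_ref (0-based)."""
       # Slice instead of index so the trailing axis is kept for broadcasting.
       ref = data[..., k_ref:k_ref + 1]
       # Assumption: pixels where the reference ion is absent are left as zeros.
       out = np.zeros_like(data, dtype=float)
       np.divide(data, ref, out=out, where=ref > 0)
       return out

   example = np.array([[[2.0, 4.0, 8.0]]])
   normalized = reference_normalize(example, k_ref=1)

After normalization the reference channel equals 1 wherever it was nonzero, and the other channels are expressed relative to it (``[0.5, 1.0, 2.0]`` in this example).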

*Statistical scaling:*  
This is a family of normalization methods in which each spectrum is scaled by a scalar summary statistic of its intensity values. For each pixel, the normalized spectrum is defined as:

.. math::

   \text{SCALE}_f(S_{i,j}) = \frac{S_{i,j}}{f(S_{i,j})}

Available options in MassVision for :math:`f: \mathbb{R}^N \rightarrow \mathbb{R}_{>0}` include:

- *Mean normalization:* :math:`f(S_{i,j}) = \frac{1}{N} \sum_{k=1}^{N} S_{i,j}[k]`

- *Median normalization:* :math:`f(S_{i,j}) = \operatorname{median}(S_{i,j})`

- *RMS normalization:* :math:`f(S_{i,j}) = \sqrt{\frac{1}{N} \sum_{k=1}^{N} S_{i,j}[k]^2}`

This formulation supports flexible normalization strategies: for example, median normalization offers robustness to outliers and noise, while RMS normalization emphasizes higher-intensity signals and better reflects spectral energy.
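The three statistics can share one code path, as in the sketch below. The dictionary ``SCALERS`` and the function ``scale_normalize`` are hypothetical names, and leaving pixels whose statistic is zero (for example, the median of a sparse spectrum) as zeros is an assumption.

.. code-block:: python

   import numpy as np

   # Each entry maps an (H, W, N) array to an (H, W, 1) per-pixel statistic.
   SCALERS = {
       "mean": lambda d: d.mean(axis=-1, keepdims=True),
       "median": lambda d: np.median(d, axis=-1, keepdims=True),
       "rms": lambda d: np.sqrt(np.mean(d ** 2, axis=-1, keepdims=True)),
   }

   def scale_normalize(data, method):
       """Divide each spectrum by the chosen scalar statistic f(S)."""
       f = SCALERS[method](data)
       # Assumption: pixels whose statistic is zero are left as zeros.
       out = np.zeros_like(data, dtype=float)
       np.divide(data, f, out=out, where=f > 0)
       return out

   example = np.array([[[1.0, 2.0, 6.0]]])
   by_mean = scale_normalize(example, "mean")      # f = 3
   by_median = scale_normalize(example, "median")  # f = 2
   by_rms = scale_normalize(example, "rms")        # f = sqrt(41 / 3)

In this example the single large value (6) raises the mean to 3 but leaves the median at 2, which illustrates the robustness of the median to outliers mentioned above.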

Pixel aggregation
^^^^^^^^^^^^^^^^^

Spatial denoising can be achieved by applying a local aggregation operation over a sliding kernel across each ion image. A square kernel of side length :math:`w`, with a symmetric stride (pitch) :math:`s` in both spatial directions, is applied independently to each ion image. For all integer indices :math:`i` and :math:`j`, the stride-aligned kernel center is given by:

.. math::

   (x', y') = (i \cdot s,\, j \cdot s),

and the output value at that location is computed as:

.. math::

   \hat{I}_k(x', y') = \underset{(u,v) \in \mathcal{N}_w(x', y')}{\text{AGG}} \, I_k(u, v)


where :math:`\mathcal{N}_w(x', y')` denotes the :math:`w \times w` neighborhood centered at :math:`(x', y')`. The aggregation operator :math:`\text{AGG}` is a user-selected function such as *min*, *max*, *sum*, or *mean*, each of which reduces the intensities in the neighborhood to a single scalar.
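The aggregation step can be sketched for a single ion image as follows. The function name ``aggregate`` is hypothetical, and two details that the formulation leaves open are assumptions here: kernel centers start at index 0, and neighborhoods that extend past the image border are clipped to the image. For even :math:`w` a centered window is ambiguous, so this sketch is most natural for odd :math:`w`.

.. code-block:: python

   import numpy as np

   def aggregate(image, w, s, agg=np.mean):
       """Aggregate a 2D ion image over w x w windows centered every s pixels."""
       H, W = image.shape
       half = w // 2
       xs = range(0, H, s)  # stride-aligned kernel centers along rows
       ys = range(0, W, s)  # stride-aligned kernel centers along columns
       out = np.empty((len(xs), len(ys)), dtype=float)
       for a, x in enumerate(xs):
           for b, y in enumerate(ys):
               # Assumption: windows are clipped at the image border.
               x0, x1 = max(x - half, 0), min(x - half + w, H)
               y0, y1 = max(y - half, 0), min(y - half + w, W)
               out[a, b] = agg(image[x0:x1, y0:y1])
       return out

   image = np.arange(16, dtype=float).reshape(4, 4)
   mean_img = aggregate(image, w=3, s=2)              # mean over each window
   max_img = aggregate(image, w=3, s=2, agg=np.max)   # max over each window

With :math:`w = 3` and :math:`s = 2` the 4 × 4 input is reduced to a 2 × 2 output. To process a full dataset, the same function would be applied to each ion image ``data[:, :, k]`` independently.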